Presentation: "Scalable Data Science and Deep Learning with H2O"
H2O is a fast, scalable, open-source machine learning and deep learning platform for Smarter Applications. Using in-memory compression techniques, H2O can handle billions of data rows in memory, even on small compute clusters. The platform includes interfaces for R, Python, Scala, Java, JS and JSON, along with an interactive graphical Flow interface that makes it easier for non-engineers to stitch together complete analytic workflows. H2O was built alongside (and on top of) both Hadoop and Spark clusters and can be deployed within minutes. Sparkling Water combines the flexibility of Spark with the speed and accuracy of H2O's machine learning solution.
In this talk, we explain H2O's scalable in-memory architecture and design principles and outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos, including world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We will also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. We will cover data ingestion, feature engineering, model tuning, model validation and model selection, as well as how to take models into production. Live demos will be run on distributed systems. By the end of this presentation, you will know how to create your own machine learning models on your data using R, Python (IPython Notebooks) or Flow.